Computer Vision Project Solution¶

Problem Statement¶

Context¶

Company X owns a movie application and repository that caters to movie streaming to millions of users on a subscription basis. The company wants to automate the process of cast and crew information in each scene from a movie such that when a user pauses on the movie and clicks on the cast information button, the app will show details of the actor in the scene. The company has in-house computer vision and multimedia experts who need to detect faces from screenshots of the movie scene.

Objective¶

Part A: To build a face detection system

Part B: To create an image dataset to be used by the AI team to build image classifier data

Part C: To build a face recognition system

Mount the drive¶

In [1]:
from google.colab import drive
drive.mount('/content/drive')
Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).

Import libraries¶

In [2]:
import os
import sys
import random

import numpy as np
import pandas as pd
import tensorflow as tf


import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import cv2

from sklearn.model_selection import train_test_split
from keras.layers import (Input, Conv2D, MaxPooling2D, UpSampling2D, Concatenate,
                          ZeroPadding2D, Convolution2D, Flatten, Activation,
                          Conv2DTranspose, BatchNormalization, Dropout, Lambda, ReLU,Reshape)

from tensorflow.keras.models import Sequential

from tensorflow.python.keras import losses
from tensorflow.keras.losses import binary_crossentropy
from tensorflow.keras.backend import log, epsilon

from tensorflow.keras.applications.resnet import preprocess_input
from tensorflow.keras.applications import ResNet50

from keras.models import Model, load_model
from keras.optimizers import Adam

from tensorflow.keras.callbacks import ModelCheckpoint, EarlyStopping, ReduceLROnPlateau

from tensorflow.keras.applications.vgg16 import VGG16
from tensorflow.keras.models import Model


from sklearn.preprocessing import LabelEncoder
In [3]:
import warnings
warnings.filterwarnings('ignore')

PART I¶

Read ‘images.npy’.¶

In [4]:
# Load the image file of the dataset
data = np.load('/content/drive/My Drive/GL_CV/data/Images.npy',allow_pickle=True)
data.shape
Out[4]:
(393, 2)
In [5]:
#Check the first image shape
img = data[0][0]
img.shape
Out[5]:
(333, 650, 3)

Split the data into Features(X) & labels(Y). Unify shape of all the images.¶

In [6]:
#define image dimensions as per resnet input
IMAGE_WIDTH = 224
IMAGE_HEIGHT = 224
IMAGE_SIZE = (IMAGE_WIDTH , IMAGE_HEIGHT, 3)
In [7]:
#create zero arrays to hold original images - X and ground truth masks - y

y = np.zeros((int(data.shape[0]), IMAGE_WIDTH,IMAGE_HEIGHT))
X = np.zeros((int(data.shape[0]), IMAGE_WIDTH, IMAGE_HEIGHT, 3))

for index in range(data.shape[0]):

    img = data[index][0]
    img = cv2.resize(img, dsize=(IMAGE_WIDTH, IMAGE_HEIGHT),interpolation = cv2.INTER_CUBIC)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

    X[index] = preprocess_input(img)

    for data_list in data[index][1]:
          x1 = int(data_list["points"][0]['x'] * IMAGE_WIDTH)
          x2 = int(data_list["points"][1]['x'] * IMAGE_WIDTH)
          y1 = int(data_list["points"][0]['y'] * IMAGE_HEIGHT)
          y2 = int(data_list["points"][1]['y'] * IMAGE_HEIGHT)
          y[index][y1:y2, x1:x2] = 1
In [7]:

Split the data into train and test¶

In [8]:
# FIrst 350 images will be train data and remaining will be test data


X_train = X[:350]
y_train = y[:350]

X_test = X[350:]
y_test = y[350:]
In [9]:
X_train.shape
Out[9]:
(350, 224, 224, 3)
In [10]:
X_test.shape
Out[10]:
(43, 224, 224, 3)

Select random image from the train data and display original image and masked image.¶

In [11]:
import random
index = random.randrange(0, 300)

original_image = data[index][0]  # The original image from data
original_image = cv2.resize(original_image, dsize=(IMAGE_HEIGHT, IMAGE_WIDTH), interpolation=cv2.INTER_CUBIC)
original_image = original_image[:, :, :3]  # Ensure it has 3 channels


# Retrieve the corresponding mask and ensure it's resized
mask = cv2.resize(y[index], dsize=(IMAGE_HEIGHT, IMAGE_WIDTH), interpolation=cv2.INTER_NEAREST)

fig,(ax1,ax2) = plt.subplots(1,2,figsize=(10,8))

ax1.imshow(original_image.astype(np.uint8))
ax1.set_title(f'Original Image at index {index}')

plt.imshow(original_image.astype(np.uint8))  # Show the original image in the background
plt.imshow(mask, cmap='jet', alpha=0.5)      # Overlay the mask with transparency
ax2.set_title(f'Masked Image at index {index}')
ax1.axis('off')
ax2.axis('off')
plt.show()
No description has been provided for this image
In [11]:

Face mask detection model¶

We will build a Face detection model based on UNET architecture by using a pretrained ResNet model and finetuning it.

  • The encoders in the UNET will be built based on the outputs of ResNet model
  • The decoders in the UNET will be FCN that will be concatenated with the corresponding ResNet outputs (skip connections)
  • Final model will have:
    • input shape as 224,224,3 - same as ResNet
    • output shape as (None, 224, 224) - as needed to calculate Dice Coefficient during training

Model Train options

  • Number of epochs - 20
  • Batch size - 8
  • Optimiser - Adam with learning rate = 0.0001
  • Loss - Binary Cross Entropy + Dice Loss
  • Metrics - Dice Coefficient
  • Callbacks - Early Stopping, Model Checkpoint, Reduce Learning Rate on Plateau

Helper methods¶

In [12]:
def build_conv_layer(input_img, filters=64):
    # Taking first input and implementing the first conv block
    conv1 = Conv2D(filters, kernel_size = (3,3), padding = "same")(input_img)
    batch_norm1 = BatchNormalization()(conv1)
    act1 = ReLU()(batch_norm1)

    # Taking first input and implementing the second conv block
    conv2 = Conv2D(filters, kernel_size = (3,3), padding = "same")(act1)
    batch_norm2 = BatchNormalization()(conv2)
    act2 = ReLU()(batch_norm2)

    return act2


def build_decoder_layer(input_img, skip, filters=64):
    # Upsampling and concatenating the essential features
    conv_transpose = Conv2DTranspose(filters, (2, 2), strides=2, padding="same")(input_img)
    skip_layer = Concatenate()([conv_transpose, skip])
    out = build_conv_layer(skip_layer, filters)
    return out
In [12]:

In [13]:
def build_resnet_unet(input_shape):


    # Pre-trained ResNet50 Model with ImageNet weights
    resnet50 = ResNet50(include_top=False, weights="imagenet", input_shape = (IMAGE_WIDTH,IMAGE_HEIGHT,3))

    # Encoders
    s1 = resnet50.input
    s2 = resnet50.get_layer("conv1_relu").output
    s3 = resnet50.get_layer("conv2_block3_out").output
    s4 = resnet50.get_layer("conv3_block4_out").output

    # Bridge
    b1 = resnet50.get_layer("conv4_block6_out").output

    # Decoders
    d1 = build_decoder_layer(b1, s4, 1024)
    d2 = build_decoder_layer(d1, s3, 512)
    d3 = build_decoder_layer(d2, s2, 256)
    d4 = build_decoder_layer(d3, s1, 64)

    # Output
    out1 = Conv2D(1, 1, padding="same", activation="sigmoid")(d4)
    outputs = Reshape((IMAGE_HEIGHT, IMAGE_WIDTH))(out1)

    model = Model(resnet50.input, outputs, name="ResNet50_U-Net")
    return model
In [14]:
model = build_resnet_unet(IMAGE_SIZE)
model.summary()
Model: "ResNet50_U-Net"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Layer (type)              ┃ Output Shape           ┃        Param # ┃ Connected to           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━┩
│ input_layer (InputLayer)  │ (None, 224, 224, 3)    │              0 │ -                      │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv1_pad (ZeroPadding2D) │ (None, 230, 230, 3)    │              0 │ input_layer[0][0]      │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv1_conv (Conv2D)       │ (None, 112, 112, 64)   │          9,472 │ conv1_pad[0][0]        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv1_bn                  │ (None, 112, 112, 64)   │            256 │ conv1_conv[0][0]       │
│ (BatchNormalization)      │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv1_relu (Activation)   │ (None, 112, 112, 64)   │              0 │ conv1_bn[0][0]         │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ pool1_pad (ZeroPadding2D) │ (None, 114, 114, 64)   │              0 │ conv1_relu[0][0]       │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ pool1_pool (MaxPooling2D) │ (None, 56, 56, 64)     │              0 │ pool1_pad[0][0]        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv2_block1_1_conv       │ (None, 56, 56, 64)     │          4,160 │ pool1_pool[0][0]       │
│ (Conv2D)                  │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv2_block1_1_bn         │ (None, 56, 56, 64)     │            256 │ conv2_block1_1_conv[0… │
│ (BatchNormalization)      │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv2_block1_1_relu       │ (None, 56, 56, 64)     │              0 │ conv2_block1_1_bn[0][… │
│ (Activation)              │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv2_block1_2_conv       │ (None, 56, 56, 64)     │         36,928 │ conv2_block1_1_relu[0… │
│ (Conv2D)                  │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv2_block1_2_bn         │ (None, 56, 56, 64)     │            256 │ conv2_block1_2_conv[0… │
│ (BatchNormalization)      │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv2_block1_2_relu       │ (None, 56, 56, 64)     │              0 │ conv2_block1_2_bn[0][… │
│ (Activation)              │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv2_block1_0_conv       │ (None, 56, 56, 256)    │         16,640 │ pool1_pool[0][0]       │
│ (Conv2D)                  │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv2_block1_3_conv       │ (None, 56, 56, 256)    │         16,640 │ conv2_block1_2_relu[0… │
│ (Conv2D)                  │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv2_block1_0_bn         │ (None, 56, 56, 256)    │          1,024 │ conv2_block1_0_conv[0… │
│ (BatchNormalization)      │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv2_block1_3_bn         │ (None, 56, 56, 256)    │          1,024 │ conv2_block1_3_conv[0… │
│ (BatchNormalization)      │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv2_block1_add (Add)    │ (None, 56, 56, 256)    │              0 │ conv2_block1_0_bn[0][… │
│                           │                        │                │ conv2_block1_3_bn[0][… │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv2_block1_out          │ (None, 56, 56, 256)    │              0 │ conv2_block1_add[0][0] │
│ (Activation)              │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv2_block2_1_conv       │ (None, 56, 56, 64)     │         16,448 │ conv2_block1_out[0][0] │
│ (Conv2D)                  │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv2_block2_1_bn         │ (None, 56, 56, 64)     │            256 │ conv2_block2_1_conv[0… │
│ (BatchNormalization)      │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv2_block2_1_relu       │ (None, 56, 56, 64)     │              0 │ conv2_block2_1_bn[0][… │
│ (Activation)              │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv2_block2_2_conv       │ (None, 56, 56, 64)     │         36,928 │ conv2_block2_1_relu[0… │
│ (Conv2D)                  │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv2_block2_2_bn         │ (None, 56, 56, 64)     │            256 │ conv2_block2_2_conv[0… │
│ (BatchNormalization)      │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv2_block2_2_relu       │ (None, 56, 56, 64)     │              0 │ conv2_block2_2_bn[0][… │
│ (Activation)              │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv2_block2_3_conv       │ (None, 56, 56, 256)    │         16,640 │ conv2_block2_2_relu[0… │
│ (Conv2D)                  │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv2_block2_3_bn         │ (None, 56, 56, 256)    │          1,024 │ conv2_block2_3_conv[0… │
│ (BatchNormalization)      │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv2_block2_add (Add)    │ (None, 56, 56, 256)    │              0 │ conv2_block1_out[0][0… │
│                           │                        │                │ conv2_block2_3_bn[0][… │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv2_block2_out          │ (None, 56, 56, 256)    │              0 │ conv2_block2_add[0][0] │
│ (Activation)              │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv2_block3_1_conv       │ (None, 56, 56, 64)     │         16,448 │ conv2_block2_out[0][0] │
│ (Conv2D)                  │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv2_block3_1_bn         │ (None, 56, 56, 64)     │            256 │ conv2_block3_1_conv[0… │
│ (BatchNormalization)      │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv2_block3_1_relu       │ (None, 56, 56, 64)     │              0 │ conv2_block3_1_bn[0][… │
│ (Activation)              │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv2_block3_2_conv       │ (None, 56, 56, 64)     │         36,928 │ conv2_block3_1_relu[0… │
│ (Conv2D)                  │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv2_block3_2_bn         │ (None, 56, 56, 64)     │            256 │ conv2_block3_2_conv[0… │
│ (BatchNormalization)      │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv2_block3_2_relu       │ (None, 56, 56, 64)     │              0 │ conv2_block3_2_bn[0][… │
│ (Activation)              │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv2_block3_3_conv       │ (None, 56, 56, 256)    │         16,640 │ conv2_block3_2_relu[0… │
│ (Conv2D)                  │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv2_block3_3_bn         │ (None, 56, 56, 256)    │          1,024 │ conv2_block3_3_conv[0… │
│ (BatchNormalization)      │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv2_block3_add (Add)    │ (None, 56, 56, 256)    │              0 │ conv2_block2_out[0][0… │
│                           │                        │                │ conv2_block3_3_bn[0][… │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv2_block3_out          │ (None, 56, 56, 256)    │              0 │ conv2_block3_add[0][0] │
│ (Activation)              │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv3_block1_1_conv       │ (None, 28, 28, 128)    │         32,896 │ conv2_block3_out[0][0] │
│ (Conv2D)                  │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv3_block1_1_bn         │ (None, 28, 28, 128)    │            512 │ conv3_block1_1_conv[0… │
│ (BatchNormalization)      │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv3_block1_1_relu       │ (None, 28, 28, 128)    │              0 │ conv3_block1_1_bn[0][… │
│ (Activation)              │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv3_block1_2_conv       │ (None, 28, 28, 128)    │        147,584 │ conv3_block1_1_relu[0… │
│ (Conv2D)                  │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv3_block1_2_bn         │ (None, 28, 28, 128)    │            512 │ conv3_block1_2_conv[0… │
│ (BatchNormalization)      │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv3_block1_2_relu       │ (None, 28, 28, 128)    │              0 │ conv3_block1_2_bn[0][… │
│ (Activation)              │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv3_block1_0_conv       │ (None, 28, 28, 512)    │        131,584 │ conv2_block3_out[0][0] │
│ (Conv2D)                  │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv3_block1_3_conv       │ (None, 28, 28, 512)    │         66,048 │ conv3_block1_2_relu[0… │
│ (Conv2D)                  │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv3_block1_0_bn         │ (None, 28, 28, 512)    │          2,048 │ conv3_block1_0_conv[0… │
│ (BatchNormalization)      │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv3_block1_3_bn         │ (None, 28, 28, 512)    │          2,048 │ conv3_block1_3_conv[0… │
│ (BatchNormalization)      │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv3_block1_add (Add)    │ (None, 28, 28, 512)    │              0 │ conv3_block1_0_bn[0][… │
│                           │                        │                │ conv3_block1_3_bn[0][… │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv3_block1_out          │ (None, 28, 28, 512)    │              0 │ conv3_block1_add[0][0] │
│ (Activation)              │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv3_block2_1_conv       │ (None, 28, 28, 128)    │         65,664 │ conv3_block1_out[0][0] │
│ (Conv2D)                  │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv3_block2_1_bn         │ (None, 28, 28, 128)    │            512 │ conv3_block2_1_conv[0… │
│ (BatchNormalization)      │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv3_block2_1_relu       │ (None, 28, 28, 128)    │              0 │ conv3_block2_1_bn[0][… │
│ (Activation)              │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv3_block2_2_conv       │ (None, 28, 28, 128)    │        147,584 │ conv3_block2_1_relu[0… │
│ (Conv2D)                  │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv3_block2_2_bn         │ (None, 28, 28, 128)    │            512 │ conv3_block2_2_conv[0… │
│ (BatchNormalization)      │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv3_block2_2_relu       │ (None, 28, 28, 128)    │              0 │ conv3_block2_2_bn[0][… │
│ (Activation)              │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv3_block2_3_conv       │ (None, 28, 28, 512)    │         66,048 │ conv3_block2_2_relu[0… │
│ (Conv2D)                  │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv3_block2_3_bn         │ (None, 28, 28, 512)    │          2,048 │ conv3_block2_3_conv[0… │
│ (BatchNormalization)      │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv3_block2_add (Add)    │ (None, 28, 28, 512)    │              0 │ conv3_block1_out[0][0… │
│                           │                        │                │ conv3_block2_3_bn[0][… │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv3_block2_out          │ (None, 28, 28, 512)    │              0 │ conv3_block2_add[0][0] │
│ (Activation)              │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv3_block3_1_conv       │ (None, 28, 28, 128)    │         65,664 │ conv3_block2_out[0][0] │
│ (Conv2D)                  │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv3_block3_1_bn         │ (None, 28, 28, 128)    │            512 │ conv3_block3_1_conv[0… │
│ (BatchNormalization)      │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv3_block3_1_relu       │ (None, 28, 28, 128)    │              0 │ conv3_block3_1_bn[0][… │
│ (Activation)              │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv3_block3_2_conv       │ (None, 28, 28, 128)    │        147,584 │ conv3_block3_1_relu[0… │
│ (Conv2D)                  │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv3_block3_2_bn         │ (None, 28, 28, 128)    │            512 │ conv3_block3_2_conv[0… │
│ (BatchNormalization)      │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv3_block3_2_relu       │ (None, 28, 28, 128)    │              0 │ conv3_block3_2_bn[0][… │
│ (Activation)              │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv3_block3_3_conv       │ (None, 28, 28, 512)    │         66,048 │ conv3_block3_2_relu[0… │
│ (Conv2D)                  │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv3_block3_3_bn         │ (None, 28, 28, 512)    │          2,048 │ conv3_block3_3_conv[0… │
│ (BatchNormalization)      │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv3_block3_add (Add)    │ (None, 28, 28, 512)    │              0 │ conv3_block2_out[0][0… │
│                           │                        │                │ conv3_block3_3_bn[0][… │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv3_block3_out          │ (None, 28, 28, 512)    │              0 │ conv3_block3_add[0][0] │
│ (Activation)              │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv3_block4_1_conv       │ (None, 28, 28, 128)    │         65,664 │ conv3_block3_out[0][0] │
│ (Conv2D)                  │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv3_block4_1_bn         │ (None, 28, 28, 128)    │            512 │ conv3_block4_1_conv[0… │
│ (BatchNormalization)      │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv3_block4_1_relu       │ (None, 28, 28, 128)    │              0 │ conv3_block4_1_bn[0][… │
│ (Activation)              │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv3_block4_2_conv       │ (None, 28, 28, 128)    │        147,584 │ conv3_block4_1_relu[0… │
│ (Conv2D)                  │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv3_block4_2_bn         │ (None, 28, 28, 128)    │            512 │ conv3_block4_2_conv[0… │
│ (BatchNormalization)      │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv3_block4_2_relu       │ (None, 28, 28, 128)    │              0 │ conv3_block4_2_bn[0][… │
│ (Activation)              │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv3_block4_3_conv       │ (None, 28, 28, 512)    │         66,048 │ conv3_block4_2_relu[0… │
│ (Conv2D)                  │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv3_block4_3_bn         │ (None, 28, 28, 512)    │          2,048 │ conv3_block4_3_conv[0… │
│ (BatchNormalization)      │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv3_block4_add (Add)    │ (None, 28, 28, 512)    │              0 │ conv3_block3_out[0][0… │
│                           │                        │                │ conv3_block4_3_bn[0][… │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv3_block4_out          │ (None, 28, 28, 512)    │              0 │ conv3_block4_add[0][0] │
│ (Activation)              │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv4_block1_1_conv       │ (None, 14, 14, 256)    │        131,328 │ conv3_block4_out[0][0] │
│ (Conv2D)                  │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv4_block1_1_bn         │ (None, 14, 14, 256)    │          1,024 │ conv4_block1_1_conv[0… │
│ (BatchNormalization)      │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv4_block1_1_relu       │ (None, 14, 14, 256)    │              0 │ conv4_block1_1_bn[0][… │
│ (Activation)              │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv4_block1_2_conv       │ (None, 14, 14, 256)    │        590,080 │ conv4_block1_1_relu[0… │
│ (Conv2D)                  │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv4_block1_2_bn         │ (None, 14, 14, 256)    │          1,024 │ conv4_block1_2_conv[0… │
│ (BatchNormalization)      │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv4_block1_2_relu       │ (None, 14, 14, 256)    │              0 │ conv4_block1_2_bn[0][… │
│ (Activation)              │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv4_block1_0_conv       │ (None, 14, 14, 1024)   │        525,312 │ conv3_block4_out[0][0] │
│ (Conv2D)                  │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv4_block1_3_conv       │ (None, 14, 14, 1024)   │        263,168 │ conv4_block1_2_relu[0… │
│ (Conv2D)                  │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv4_block1_0_bn         │ (None, 14, 14, 1024)   │          4,096 │ conv4_block1_0_conv[0… │
│ (BatchNormalization)      │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv4_block1_3_bn         │ (None, 14, 14, 1024)   │          4,096 │ conv4_block1_3_conv[0… │
│ (BatchNormalization)      │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv4_block1_add (Add)    │ (None, 14, 14, 1024)   │              0 │ conv4_block1_0_bn[0][… │
│                           │                        │                │ conv4_block1_3_bn[0][… │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv4_block1_out          │ (None, 14, 14, 1024)   │              0 │ conv4_block1_add[0][0] │
│ (Activation)              │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv4_block2_1_conv       │ (None, 14, 14, 256)    │        262,400 │ conv4_block1_out[0][0] │
│ (Conv2D)                  │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv4_block2_1_bn         │ (None, 14, 14, 256)    │          1,024 │ conv4_block2_1_conv[0… │
│ (BatchNormalization)      │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv4_block2_1_relu       │ (None, 14, 14, 256)    │              0 │ conv4_block2_1_bn[0][… │
│ (Activation)              │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv4_block2_2_conv       │ (None, 14, 14, 256)    │        590,080 │ conv4_block2_1_relu[0… │
│ (Conv2D)                  │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv4_block2_2_bn         │ (None, 14, 14, 256)    │          1,024 │ conv4_block2_2_conv[0… │
│ (BatchNormalization)      │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv4_block2_2_relu       │ (None, 14, 14, 256)    │              0 │ conv4_block2_2_bn[0][… │
│ (Activation)              │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv4_block2_3_conv       │ (None, 14, 14, 1024)   │        263,168 │ conv4_block2_2_relu[0… │
│ (Conv2D)                  │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv4_block2_3_bn         │ (None, 14, 14, 1024)   │          4,096 │ conv4_block2_3_conv[0… │
│ (BatchNormalization)      │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv4_block2_add (Add)    │ (None, 14, 14, 1024)   │              0 │ conv4_block1_out[0][0… │
│                           │                        │                │ conv4_block2_3_bn[0][… │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv4_block2_out          │ (None, 14, 14, 1024)   │              0 │ conv4_block2_add[0][0] │
│ (Activation)              │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv4_block3_1_conv       │ (None, 14, 14, 256)    │        262,400 │ conv4_block2_out[0][0] │
│ (Conv2D)                  │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv4_block3_1_bn         │ (None, 14, 14, 256)    │          1,024 │ conv4_block3_1_conv[0… │
│ (BatchNormalization)      │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv4_block3_1_relu       │ (None, 14, 14, 256)    │              0 │ conv4_block3_1_bn[0][… │
│ (Activation)              │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv4_block3_2_conv       │ (None, 14, 14, 256)    │        590,080 │ conv4_block3_1_relu[0… │
│ (Conv2D)                  │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv4_block3_2_bn         │ (None, 14, 14, 256)    │          1,024 │ conv4_block3_2_conv[0… │
│ (BatchNormalization)      │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv4_block3_2_relu       │ (None, 14, 14, 256)    │              0 │ conv4_block3_2_bn[0][… │
│ (Activation)              │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv4_block3_3_conv       │ (None, 14, 14, 1024)   │        263,168 │ conv4_block3_2_relu[0… │
│ (Conv2D)                  │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv4_block3_3_bn         │ (None, 14, 14, 1024)   │          4,096 │ conv4_block3_3_conv[0… │
│ (BatchNormalization)      │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv4_block3_add (Add)    │ (None, 14, 14, 1024)   │              0 │ conv4_block2_out[0][0… │
│                           │                        │                │ conv4_block3_3_bn[0][… │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv4_block3_out          │ (None, 14, 14, 1024)   │              0 │ conv4_block3_add[0][0] │
│ (Activation)              │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv4_block4_1_conv       │ (None, 14, 14, 256)    │        262,400 │ conv4_block3_out[0][0] │
│ (Conv2D)                  │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv4_block4_1_bn         │ (None, 14, 14, 256)    │          1,024 │ conv4_block4_1_conv[0… │
│ (BatchNormalization)      │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv4_block4_1_relu       │ (None, 14, 14, 256)    │              0 │ conv4_block4_1_bn[0][… │
│ (Activation)              │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv4_block4_2_conv       │ (None, 14, 14, 256)    │        590,080 │ conv4_block4_1_relu[0… │
│ (Conv2D)                  │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv4_block4_2_bn         │ (None, 14, 14, 256)    │          1,024 │ conv4_block4_2_conv[0… │
│ (BatchNormalization)      │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv4_block4_2_relu       │ (None, 14, 14, 256)    │              0 │ conv4_block4_2_bn[0][… │
│ (Activation)              │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv4_block4_3_conv       │ (None, 14, 14, 1024)   │        263,168 │ conv4_block4_2_relu[0… │
│ (Conv2D)                  │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv4_block4_3_bn         │ (None, 14, 14, 1024)   │          4,096 │ conv4_block4_3_conv[0… │
│ (BatchNormalization)      │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv4_block4_add (Add)    │ (None, 14, 14, 1024)   │              0 │ conv4_block3_out[0][0… │
│                           │                        │                │ conv4_block4_3_bn[0][… │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv4_block4_out          │ (None, 14, 14, 1024)   │              0 │ conv4_block4_add[0][0] │
│ (Activation)              │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv4_block5_1_conv       │ (None, 14, 14, 256)    │        262,400 │ conv4_block4_out[0][0] │
│ (Conv2D)                  │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv4_block5_1_bn         │ (None, 14, 14, 256)    │          1,024 │ conv4_block5_1_conv[0… │
│ (BatchNormalization)      │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv4_block5_1_relu       │ (None, 14, 14, 256)    │              0 │ conv4_block5_1_bn[0][… │
│ (Activation)              │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv4_block5_2_conv       │ (None, 14, 14, 256)    │        590,080 │ conv4_block5_1_relu[0… │
│ (Conv2D)                  │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv4_block5_2_bn         │ (None, 14, 14, 256)    │          1,024 │ conv4_block5_2_conv[0… │
│ (BatchNormalization)      │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv4_block5_2_relu       │ (None, 14, 14, 256)    │              0 │ conv4_block5_2_bn[0][… │
│ (Activation)              │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv4_block5_3_conv       │ (None, 14, 14, 1024)   │        263,168 │ conv4_block5_2_relu[0… │
│ (Conv2D)                  │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv4_block5_3_bn         │ (None, 14, 14, 1024)   │          4,096 │ conv4_block5_3_conv[0… │
│ (BatchNormalization)      │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv4_block5_add (Add)    │ (None, 14, 14, 1024)   │              0 │ conv4_block4_out[0][0… │
│                           │                        │                │ conv4_block5_3_bn[0][… │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv4_block5_out          │ (None, 14, 14, 1024)   │              0 │ conv4_block5_add[0][0] │
│ (Activation)              │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv4_block6_1_conv       │ (None, 14, 14, 256)    │        262,400 │ conv4_block5_out[0][0] │
│ (Conv2D)                  │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv4_block6_1_bn         │ (None, 14, 14, 256)    │          1,024 │ conv4_block6_1_conv[0… │
│ (BatchNormalization)      │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv4_block6_1_relu       │ (None, 14, 14, 256)    │              0 │ conv4_block6_1_bn[0][… │
│ (Activation)              │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv4_block6_2_conv       │ (None, 14, 14, 256)    │        590,080 │ conv4_block6_1_relu[0… │
│ (Conv2D)                  │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv4_block6_2_bn         │ (None, 14, 14, 256)    │          1,024 │ conv4_block6_2_conv[0… │
│ (BatchNormalization)      │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv4_block6_2_relu       │ (None, 14, 14, 256)    │              0 │ conv4_block6_2_bn[0][… │
│ (Activation)              │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv4_block6_3_conv       │ (None, 14, 14, 1024)   │        263,168 │ conv4_block6_2_relu[0… │
│ (Conv2D)                  │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv4_block6_3_bn         │ (None, 14, 14, 1024)   │          4,096 │ conv4_block6_3_conv[0… │
│ (BatchNormalization)      │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv4_block6_add (Add)    │ (None, 14, 14, 1024)   │              0 │ conv4_block5_out[0][0… │
│                           │                        │                │ conv4_block6_3_bn[0][… │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv4_block6_out          │ (None, 14, 14, 1024)   │              0 │ conv4_block6_add[0][0] │
│ (Activation)              │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv2d_transpose          │ (None, 28, 28, 1024)   │      4,195,328 │ conv4_block6_out[0][0] │
│ (Conv2DTranspose)         │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ concatenate (Concatenate) │ (None, 28, 28, 1536)   │              0 │ conv2d_transpose[0][0… │
│                           │                        │                │ conv3_block4_out[0][0] │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv2d (Conv2D)           │ (None, 28, 28, 1024)   │     14,156,800 │ concatenate[0][0]      │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ batch_normalization       │ (None, 28, 28, 1024)   │          4,096 │ conv2d[0][0]           │
│ (BatchNormalization)      │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ re_lu (ReLU)              │ (None, 28, 28, 1024)   │              0 │ batch_normalization[0… │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv2d_1 (Conv2D)         │ (None, 28, 28, 1024)   │      9,438,208 │ re_lu[0][0]            │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ batch_normalization_1     │ (None, 28, 28, 1024)   │          4,096 │ conv2d_1[0][0]         │
│ (BatchNormalization)      │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ re_lu_1 (ReLU)            │ (None, 28, 28, 1024)   │              0 │ batch_normalization_1… │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv2d_transpose_1        │ (None, 56, 56, 512)    │      2,097,664 │ re_lu_1[0][0]          │
│ (Conv2DTranspose)         │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ concatenate_1             │ (None, 56, 56, 768)    │              0 │ conv2d_transpose_1[0]… │
│ (Concatenate)             │                        │                │ conv2_block3_out[0][0] │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv2d_2 (Conv2D)         │ (None, 56, 56, 512)    │      3,539,456 │ concatenate_1[0][0]    │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ batch_normalization_2     │ (None, 56, 56, 512)    │          2,048 │ conv2d_2[0][0]         │
│ (BatchNormalization)      │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ re_lu_2 (ReLU)            │ (None, 56, 56, 512)    │              0 │ batch_normalization_2… │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv2d_3 (Conv2D)         │ (None, 56, 56, 512)    │      2,359,808 │ re_lu_2[0][0]          │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ batch_normalization_3     │ (None, 56, 56, 512)    │          2,048 │ conv2d_3[0][0]         │
│ (BatchNormalization)      │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ re_lu_3 (ReLU)            │ (None, 56, 56, 512)    │              0 │ batch_normalization_3… │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv2d_transpose_2        │ (None, 112, 112, 256)  │        524,544 │ re_lu_3[0][0]          │
│ (Conv2DTranspose)         │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ concatenate_2             │ (None, 112, 112, 320)  │              0 │ conv2d_transpose_2[0]… │
│ (Concatenate)             │                        │                │ conv1_relu[0][0]       │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv2d_4 (Conv2D)         │ (None, 112, 112, 256)  │        737,536 │ concatenate_2[0][0]    │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ batch_normalization_4     │ (None, 112, 112, 256)  │          1,024 │ conv2d_4[0][0]         │
│ (BatchNormalization)      │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ re_lu_4 (ReLU)            │ (None, 112, 112, 256)  │              0 │ batch_normalization_4… │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv2d_5 (Conv2D)         │ (None, 112, 112, 256)  │        590,080 │ re_lu_4[0][0]          │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ batch_normalization_5     │ (None, 112, 112, 256)  │          1,024 │ conv2d_5[0][0]         │
│ (BatchNormalization)      │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ re_lu_5 (ReLU)            │ (None, 112, 112, 256)  │              0 │ batch_normalization_5… │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv2d_transpose_3        │ (None, 224, 224, 64)   │         65,600 │ re_lu_5[0][0]          │
│ (Conv2DTranspose)         │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ concatenate_3             │ (None, 224, 224, 67)   │              0 │ conv2d_transpose_3[0]… │
│ (Concatenate)             │                        │                │ input_layer[0][0]      │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv2d_6 (Conv2D)         │ (None, 224, 224, 64)   │         38,656 │ concatenate_3[0][0]    │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ batch_normalization_6     │ (None, 224, 224, 64)   │            256 │ conv2d_6[0][0]         │
│ (BatchNormalization)      │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ re_lu_6 (ReLU)            │ (None, 224, 224, 64)   │              0 │ batch_normalization_6… │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv2d_7 (Conv2D)         │ (None, 224, 224, 64)   │         36,928 │ re_lu_6[0][0]          │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ batch_normalization_7     │ (None, 224, 224, 64)   │            256 │ conv2d_7[0][0]         │
│ (BatchNormalization)      │                        │                │                        │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ re_lu_7 (ReLU)            │ (None, 224, 224, 64)   │              0 │ batch_normalization_7… │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ conv2d_8 (Conv2D)         │ (None, 224, 224, 1)    │             65 │ re_lu_7[0][0]          │
├───────────────────────────┼────────────────────────┼────────────────┼────────────────────────┤
│ reshape (Reshape)         │ (None, 224, 224)       │              0 │ conv2d_8[0][0]         │
└───────────────────────────┴────────────────────────┴────────────────┴────────────────────────┘
 Total params: 46,384,705 (176.94 MB)
 Trainable params: 46,346,689 (176.80 MB)
 Non-trainable params: 38,016 (148.50 KB)
In [15]:
tf.keras.utils.plot_model(model, "./model.png", show_shapes=False, show_dtype=False, show_layer_names=True, rankdir='TB', expand_nested=False, dpi=96)
Out[15]:
No description has been provided for this image

Model Train¶

We will use Dice Coefficient as a metric to evaluate during training the model.

For the loss, we will consider 2 losses : binary_crossentropy and dice loss (log of Dice Coefficient metric)

In [16]:
def dice_coefficient(y_true, y_pred):
    '''
    Calculate Dice Coefficient between ground truth masks and predicted masks
    '''
    # Check if y_pred has the channel dimension
    if len(y_pred.shape) == 3:
        # Add a channel dimension to y_pred
        y_pred = tf.expand_dims(y_pred, axis=-1)

     # Add a channel dimension to y_true if it doesn't have one
    if len(y_true.shape) == 3:
        y_true = tf.expand_dims(y_true, axis=-1)

    numerator = 2 * tf.reduce_sum(y_true * y_pred)
    denominator = tf.reduce_sum(y_true + y_pred)

    return numerator / (denominator + tf.keras.backend.epsilon())


def loss(y_true, y_pred):

    '''
    Calculate the loss between ground truth masks and predicted masks
    Considers 2 types of loss: Binary Cross Entropy and Dice Loss
    '''
    return binary_crossentropy(y_true, y_pred) - log(dice_coefficient(y_true, y_pred) + epsilon())
In [17]:
#Compile the model using Adam optimiser and custom metrics and loss functions as defined above
adam_opt = Adam(learning_rate = 0.0001)
model.compile(optimizer=adam_opt,loss= loss, metrics=[dice_coefficient])

batch_size = 8
epochs = 20

early_stop = EarlyStopping(monitor="loss", patience=5, mode="min")
checkpoint = ModelCheckpoint("/content/drive/My Drive/GL_CV/models/model-{loss:.2f}.weights.h5", monitor="loss", verbose=1, save_best_only=True,
                             save_weights_only=True, mode="min", save_freq=1)
reduce_lr = ReduceLROnPlateau(monitor="loss", factor=0.2, patience=3, min_lr=1e-6, verbose=1, mode="min")


history = model.fit(X_train,y_train,batch_size=batch_size,
                    epochs=epochs,
                    callbacks=[early_stop],
                    validation_split=0.1,
                    verbose = 1)
Epoch 1/20
40/40 ━━━━━━━━━━━━━━━━━━━━ 139s 1s/step - dice_coefficient: 0.2430 - loss: 2.1920 - val_dice_coefficient: 0.5108 - val_loss: 2.1255
Epoch 2/20
40/40 ━━━━━━━━━━━━━━━━━━━━ 10s 240ms/step - dice_coefficient: 0.4601 - loss: 1.2064 - val_dice_coefficient: 0.3729 - val_loss: 2.4529
Epoch 3/20
40/40 ━━━━━━━━━━━━━━━━━━━━ 10s 242ms/step - dice_coefficient: 0.5543 - loss: 0.8755 - val_dice_coefficient: 0.4087 - val_loss: 2.2275
Epoch 4/20
40/40 ━━━━━━━━━━━━━━━━━━━━ 10s 242ms/step - dice_coefficient: 0.6340 - loss: 0.6559 - val_dice_coefficient: 0.4515 - val_loss: 2.2620
Epoch 5/20
40/40 ━━━━━━━━━━━━━━━━━━━━ 10s 240ms/step - dice_coefficient: 0.6966 - loss: 0.5245 - val_dice_coefficient: 0.4936 - val_loss: 1.3259
Epoch 6/20
40/40 ━━━━━━━━━━━━━━━━━━━━ 9s 237ms/step - dice_coefficient: 0.7320 - loss: 0.4381 - val_dice_coefficient: 0.5251 - val_loss: 1.1395
Epoch 7/20
40/40 ━━━━━━━━━━━━━━━━━━━━ 9s 236ms/step - dice_coefficient: 0.7341 - loss: 0.4358 - val_dice_coefficient: 0.5329 - val_loss: 1.1528
Epoch 8/20
40/40 ━━━━━━━━━━━━━━━━━━━━ 9s 235ms/step - dice_coefficient: 0.7717 - loss: 0.3706 - val_dice_coefficient: 0.5394 - val_loss: 1.0322
Epoch 9/20
40/40 ━━━━━━━━━━━━━━━━━━━━ 9s 235ms/step - dice_coefficient: 0.7878 - loss: 0.3328 - val_dice_coefficient: 0.5569 - val_loss: 0.8653
Epoch 10/20
40/40 ━━━━━━━━━━━━━━━━━━━━ 9s 235ms/step - dice_coefficient: 0.8018 - loss: 0.3023 - val_dice_coefficient: 0.5737 - val_loss: 0.7994
Epoch 11/20
40/40 ━━━━━━━━━━━━━━━━━━━━ 9s 235ms/step - dice_coefficient: 0.8108 - loss: 0.2862 - val_dice_coefficient: 0.5727 - val_loss: 0.7869
Epoch 12/20
40/40 ━━━━━━━━━━━━━━━━━━━━ 9s 235ms/step - dice_coefficient: 0.8258 - loss: 0.2640 - val_dice_coefficient: 0.6209 - val_loss: 0.7160
Epoch 13/20
40/40 ━━━━━━━━━━━━━━━━━━━━ 9s 237ms/step - dice_coefficient: 0.8022 - loss: 0.3051 - val_dice_coefficient: 0.5833 - val_loss: 0.8069
Epoch 14/20
40/40 ━━━━━━━━━━━━━━━━━━━━ 10s 238ms/step - dice_coefficient: 0.8385 - loss: 0.2384 - val_dice_coefficient: 0.6097 - val_loss: 0.7307
Epoch 15/20
40/40 ━━━━━━━━━━━━━━━━━━━━ 10s 238ms/step - dice_coefficient: 0.8611 - loss: 0.2058 - val_dice_coefficient: 0.6162 - val_loss: 0.7188
Epoch 16/20
40/40 ━━━━━━━━━━━━━━━━━━━━ 10s 238ms/step - dice_coefficient: 0.8675 - loss: 0.1958 - val_dice_coefficient: 0.6223 - val_loss: 0.6985
Epoch 17/20
40/40 ━━━━━━━━━━━━━━━━━━━━ 10s 238ms/step - dice_coefficient: 0.8642 - loss: 0.1985 - val_dice_coefficient: 0.6217 - val_loss: 0.7119
Epoch 18/20
40/40 ━━━━━━━━━━━━━━━━━━━━ 9s 237ms/step - dice_coefficient: 0.8841 - loss: 0.1678 - val_dice_coefficient: 0.6084 - val_loss: 0.7113
Epoch 19/20
40/40 ━━━━━━━━━━━━━━━━━━━━ 9s 236ms/step - dice_coefficient: 0.8670 - loss: 0.1881 - val_dice_coefficient: 0.6286 - val_loss: 0.7061
Epoch 20/20
40/40 ━━━━━━━━━━━━━━━━━━━━ 9s 236ms/step - dice_coefficient: 0.8912 - loss: 0.1567 - val_dice_coefficient: 0.6324 - val_loss: 0.7039

Model Evaluation¶

In [18]:
fig,(ax1,ax3) = plt.subplots(1,2,figsize=(10,5))

ax1.plot(history.history['loss'], label='loss')
ax1.plot(history.history['val_loss'], label = 'val_loss')
ax1.set_xlabel('Epoch')
ax1.set_ylabel('Loss')
ax1.legend(loc='upper left')
#ax1.ylim([0.0, 0.8])

ax3.plot(history.history['dice_coefficient'], label='dice_coefficient')
ax3.plot(history.history['val_dice_coefficient'], label = 'val_dice_coefficient')
ax3.set_xlabel('Epoch')
ax3.set_ylabel('Dice Coefficient')
ax3.legend(loc='lower right');
#ax3.ylim([0.5, 1])

plt.tight_layout()
plt.show()
No description has been provided for this image
In [18]:

Dice Coefficient on all Test images¶

In [19]:
def calc_dice_coeff():

  '''
    Helper method to calculate average dice coefficient for all test images

  '''

  dice_coeff = []

  for index in range(X_test.shape[0]):

      image = data[index+350][0].copy()
      image = cv2.resize(image, dsize=(IMAGE_HEIGHT, IMAGE_WIDTH), interpolation=cv2.INTER_CUBIC)
      feat_scaled = preprocess_input(np.array(image, dtype=np.float32))

      # Retrieve the corresponding mask and ensure it's resized
      mask = cv2.resize(y[index+350], dsize=(IMAGE_HEIGHT, IMAGE_WIDTH), interpolation=cv2.INTER_NEAREST)


      pred_mask = cv2.resize(1.0*(model.predict(x=np.array([feat_scaled]),verbose=0)[0] > 0.2), (IMAGE_WIDTH,IMAGE_HEIGHT))
      dice_coeff.append(dice_coefficient(mask,pred_mask))

  return np.mean(dice_coeff)
In [20]:
print(f'Dice coefficient on all test images: {calc_dice_coeff()}')
Dice coefficient on all test images: 0.6195158875598373

Test the model predictions on the test images¶

We will test the model against a few test images and display the predicted masks along with the original image and true masks.

In [21]:
def predict_mask(image_index):

  '''
    Helper method to predict and display the orginal,truth and predicted masks
    Index specified is as per orginal data set, so value should be between 351 and 393

  '''

  if image_index <= 350 or image_index >= 394:
    print("Invalid Index")
    return

  fig, (ax1,ax2,ax3) = plt.subplots(1,3,figsize=(10,10))

  # Original image
  image = data[image_index][0].copy()
  image = cv2.resize(image, dsize=(IMAGE_HEIGHT, IMAGE_WIDTH), interpolation=cv2.INTER_CUBIC)
  feat_scaled = preprocess_input(np.array(image, dtype=np.float32))

  # Retrieve the corresponding mask and ensure it's resized
  mask = cv2.resize(y[image_index], dsize=(IMAGE_HEIGHT, IMAGE_WIDTH), interpolation=cv2.INTER_NEAREST)


  pred_mask = cv2.resize(1.0*(model.predict(x=np.array([feat_scaled]),verbose=0)[0] > 0.2), (IMAGE_WIDTH,IMAGE_HEIGHT))


  ax1.imshow(image)
  ax1.set_title("Original Image")


  ax2.imshow(image.astype(np.uint8))  # Show the original image in the background
  ax2.imshow(mask, cmap='jet', alpha=0.5)      # Overlay the mask with transparency
  ax2.set_title("Original Mask")


  ax3.imshow(image.astype(np.uint8))  # Show the original image in the background
  ax3.imshow(pred_mask, cmap='jet', alpha=0.5)      # Overlay the predicted mask with transparency
  ax3.set_title("Predicted Mask")

  ax1.axis('off')
  ax2.axis('off')
  ax3.axis('off')
  plt.show()
In [22]:
predict_mask(380)
No description has been provided for this image
In [23]:
predict_mask(390)
No description has been provided for this image

Test the model on the given test images¶

In [24]:
def predict_mask_image(image_path):

  '''
    Helper method to predict and display the orginal,truth and predicted masks
    Index specified is as per orginal data set, so value should be between 351 and 393

  '''


  fig, (ax1,ax2) = plt.subplots(1,2,figsize=(8,8))

  # Original image
  image = cv2.imread(image_path)
  image = cv2.resize(image, dsize=(IMAGE_HEIGHT, IMAGE_WIDTH), interpolation=cv2.INTER_CUBIC)
  feat_scaled = preprocess_input(np.array(image, dtype=np.float32))

  pred_mask = cv2.resize(1.0*(model.predict(x=np.array([feat_scaled]),verbose=0)[0] > 0.2), (IMAGE_WIDTH,IMAGE_HEIGHT))


  ax1.imshow(image)
  ax1.set_title("Original Image")

  ax2.imshow(image.astype(np.uint8))  # Show the original image in the background
  ax2.imshow(pred_mask, cmap='jet', alpha=0.5)      # Overlay the predicted mask with transparency
  ax2.set_title("Predicted Mask")

  ax1.axis('off')
  ax2.axis('off')

  plt.show()
In [25]:
predict_mask_image("/content/drive/My Drive/GL_CV/data/Dwayne Johnson4.jpg")
No description has been provided for this image
In [26]:
predict_mask_image("/content/drive/My Drive/GL_CV/data/Benedict Cumberbatch9.jpg")
No description has been provided for this image

Insights on Performance¶

  • Validation Dice Coefficient is around 64% whereas Train Dice Coefficient is around 90%, indicates model is overfit to train data.
  • Overfit likely due to small size of images
  • Dice Coefficient on test images is around 63%
  • Since we are performing segmentation tasks, classification metrics are not considered in evaluating model performance
  • Model is identifying non-face with face-like characteristics as a face, more such samples are needed with proper masks to differentiate such images

Import images from folder ‘training_images’¶

Helper method¶

In [27]:
import zipfile
def unzip_files(zip_file_path, destination_folder):
    with zipfile.ZipFile(zip_file_path, 'r') as zip_ref:
        zip_ref.extractall(destination_folder)

    print("Files extracted successfully.")

Read training_images.zip file¶

In [28]:
TRAINING_ZIP_FILEPATH = "/content/drive/My Drive/GL_CV/data/training_images.zip"
TRAINING_UNZIP_FOLDER = "/content/"

unzip_files(TRAINING_ZIP_FILEPATH, TRAINING_UNZIP_FOLDER)
Files extracted successfully.

Detect faces, extract metadata for the faces in all the images, and write and save it into a DataFrame¶

Approach

  • Define IdentityMetdata custom class, build a list of metadata - each element will belong to this class
  • Use a Cascade Classifier with predefined weights to get the image level details and save in different lists
  • Create a Dataframe with all above details as columns
In [29]:
# custom class to hold file name and file path for each file
class IdentityMetadata():
    def __init__(self, base, file, dir_name=''):

        # dataset base directory
        self.base = base
        # identity name
        self.name = dir_name
        # image file name
        self.file = file

    def __repr__(self):
        return self.image_path()

    def image_path(self):
        return os.path.join(self.base, self.name, self.file)
In [30]:
def load_metadata(path):

    '''
    Build list of metdata given the file path to save file name and file path
    for each file in the given path

    '''


    metadata = []

    #Traverse through the directory
    for item in os.listdir(path):


        curr_item = os.path.join(path, item)

        #if current item is directory,traverse through it
        # and find names of all files
        if os.path.isdir(curr_item):

            for file in os.listdir(os.path.join(path, curr_item)):
                # Check file extension. Allow only jpg/jpeg' files.
                ext = os.path.splitext(file)[1]
                if ext == '.jpg' or ext == '.jpeg':
                    metadata.append(IdentityMetadata(path, file,curr_item,))

        #add file name and path to metadata list
        else:
            ext = os.path.splitext(curr_item)[1]
            if ext == '.jpg' or ext == '.jpeg':
                metadata.append(IdentityMetadata(path, file=curr_item))
    return np.array(metadata)

Build File names Metadata¶

In [31]:
metadata = load_metadata("/content/training_images")

#No of files in training images
print(metadata.shape)
(1091,)
In [32]:
#Sample of metadata
metadata[0]
Out[32]:
/content/training_images/real_00501.jpg
In [33]:
metadata[0].image_path()
Out[33]:
'/content/training_images/real_00501.jpg'

Use Cascade Classifier to get details about faces in each image¶

In [34]:
# Create a cascade classfier object
face_cascade=cv2.CascadeClassifier("/content/drive/My Drive/GL_CV/conf/haarcascade_frontalface_default.xml")   # frontal face

#empty lists to hold the details
x_cord = []
y_cord = []
bb_height = []
bb_width = []
num_faces = []
image_name = []


#for each image in training_images folder
#predict the face details and add to lists
for i in range (len(metadata)):

    data_path = os.path.join(metadata[i].image_path())
    img = cv2.imread(data_path)
    faces = face_cascade.detectMultiScale(img,scaleFactor=1.05,minNeighbors=5)

    #number of faces
    j=0

    # Find coordinates of the face
    for x,y,w,h in faces:
        img=cv2.rectangle(img,(x,y),(x+w,y+h),(255, 0, 0),2)
        j=j+1

    resized=cv2.resize(img,(int(img.shape[1]/2),int(img.shape[0])))

    length=len(faces)
    if length==0:
        a=0
        b=0
        c=0
        d=0
        j=0
        name=0

    else:
        a=faces[0,0]
        b=faces[0,1]
        c=faces[0,2]
        d=faces[0,3]
        name=metadata[i].image_path()


    x_cord.append(a)
    y_cord.append(b)
    bb_width.append(c)
    bb_height.append(d)
    num_faces.append(j)
    image_name.append(name)

print(f'Metadata extracted for {len(metadata)} files')
Metadata extracted for 1091 files
In [35]:
df = pd.DataFrame(x_cord, columns = ['x'])

df['y']=y_cord
df['w']=bb_width
df['h']=bb_height
df['Total_Faces']=num_faces
df['Image_Name']=image_name

df.sample(5)
Out[35]:
x y w h Total_Faces Image_Name
681 71 87 432 432 1 /content/training_images/real_00028.jpg
342 145 146 431 431 2 /content/training_images/real_00495.jpg
734 120 83 430 430 1 /content/training_images/real_00578.jpg
155 76 70 498 498 2 /content/training_images/real_00743.jpg
580 461 22 102 102 2 /content/training_images/real_01001.jpg

PART II¶

Import the data ‘PINS.zip’¶

In [36]:
PINS_ZIP_FILEPATH = "/content/drive/My Drive/GL_CV/data/PINS.zip"
PINS_UNZIP_FOLDER = "/content/"

unzip_files(PINS_ZIP_FILEPATH, PINS_UNZIP_FOLDER)
Files extracted successfully.

Create Metadata for PINS folder¶

In [37]:
#check the shape of metadata
metadata = load_metadata("/content/PINS")
print(metadata.shape)
(10770,)
In [38]:
#check metadata for index 0
metadata[0]
Out[38]:
/content/PINS/pins_Emilia Clarke/Emilia Clarke160_1011.jpg

- Read the images and extract labels from the filenames for all the folders¶

In [39]:
#empty lists to old the image and labels
img_data = []
labels = []

#For each file in PINS folder
# put the image in img_data
# put the label (directory name) in labels
for i in range(len(metadata)):

  #read image and add to img_data
  img = cv2.imread(metadata[0].image_path(), cv2.IMREAD_COLOR)
  img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
  img = cv2.resize(img, dsize=(IMAGE_HEIGHT, IMAGE_WIDTH), interpolation=cv2.INTER_CUBIC)
  img_data.append(img)

  #extract directory name and add to labels
  label = metadata[i].name.split('/')[-1].split('pins_')[1]
  labels.append(label)
In [40]:
print(f'Number of files in PINS folder: {len(img_data)}')
Number of files in PINS folder: 10770
In [41]:
print(f'Number of labels {len(labels)}')
Number of labels 10770
In [42]:
#visualise a sample image with its label
plt.imshow(img_data[0])
plt.title(labels[0])
plt.axis('off');
No description has been provided for this image

Generate embedding vectors for each image in the dataset¶

Approach

  • Use a pretrained VGG model to generate embedding vector for each file in PINS folder

  • Use Squared L2 Distance as the distance metrics

  • Use 2 thresholds:

    • Lower Threshold : Images with distance <= this threshold will be considered similar
    • Upper Threshold : Images with distance >= this threshold will be considered similar

Load VGG16 Pretrained Model¶

In [43]:
VGG_IMG_WIDTH = 224
VGG_IMG_HEIGHT = 224
VGG_IMG_SIZE = (VGG_IMG_WIDTH , VGG_IMG_HEIGHT, 3)
In [44]:
def vgg_face():

    '''
      Method to load the VGG model structure
      Use this along with weights file to create a VGG model instance


    '''


    model = Sequential()
    model.add(ZeroPadding2D((1,1),input_shape=(224,224, 3)))
    model.add(Convolution2D(64, (3, 3), activation='relu'))
    model.add(ZeroPadding2D((1,1)))
    model.add(Convolution2D(64, (3, 3), activation='relu'))
    model.add(MaxPooling2D((2,2), strides=(2,2)))

    model.add(ZeroPadding2D((1,1)))
    model.add(Convolution2D(128, (3, 3), activation='relu'))
    model.add(ZeroPadding2D((1,1)))
    model.add(Convolution2D(128, (3, 3), activation='relu'))
    model.add(MaxPooling2D((2,2), strides=(2,2)))

    model.add(ZeroPadding2D((1,1)))
    model.add(Convolution2D(256, (3, 3), activation='relu'))
    model.add(ZeroPadding2D((1,1)))
    model.add(Convolution2D(256, (3, 3), activation='relu'))
    model.add(ZeroPadding2D((1,1)))
    model.add(Convolution2D(256, (3, 3), activation='relu'))
    model.add(MaxPooling2D((2,2), strides=(2,2)))

    model.add(ZeroPadding2D((1,1)))
    model.add(Convolution2D(512, (3, 3), activation='relu'))
    model.add(ZeroPadding2D((1,1)))
    model.add(Convolution2D(512, (3, 3), activation='relu'))
    model.add(ZeroPadding2D((1,1)))
    model.add(Convolution2D(512, (3, 3), activation='relu'))
    model.add(MaxPooling2D((2,2), strides=(2,2)))

    model.add(ZeroPadding2D((1,1)))
    model.add(Convolution2D(512, (3, 3), activation='relu'))
    model.add(ZeroPadding2D((1,1)))
    model.add(Convolution2D(512, (3, 3), activation='relu'))
    model.add(ZeroPadding2D((1,1)))
    model.add(Convolution2D(512, (3, 3), activation='relu'))
    model.add(MaxPooling2D((2,2), strides=(2,2)))

    model.add(Convolution2D(4096, (7, 7), activation='relu'))
    model.add(Dropout(0.5))
    model.add(Convolution2D(4096, (1, 1), activation='relu'))
    model.add(Dropout(0.5))
    model.add(Convolution2D(2622, (1, 1)))
    model.add(Flatten())
    model.add(Activation('softmax'))
    return model
In [45]:
#create pretrained VGG16 model with weights
model = vgg_face()
model.load_weights('/content/drive/My Drive/GL_CV/conf/vgg_face_weights.h5')
In [46]:
#disregard the classifier outputs as we are interested to only generate embedding vectors
vgg_face_desc =  Model(inputs=model.layers[0].input, outputs=model.layers[-2].output)
vgg_face_desc.summary()
Model: "functional_38"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
┃ Layer (type)                         ┃ Output Shape                ┃         Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩
│ input_layer_1 (InputLayer)           │ (None, 224, 224, 3)         │               0 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ zero_padding2d (ZeroPadding2D)       │ (None, 226, 226, 3)         │               0 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ conv2d_9 (Conv2D)                    │ (None, 224, 224, 64)        │           1,792 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ zero_padding2d_1 (ZeroPadding2D)     │ (None, 226, 226, 64)        │               0 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ conv2d_10 (Conv2D)                   │ (None, 224, 224, 64)        │          36,928 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ max_pooling2d (MaxPooling2D)         │ (None, 112, 112, 64)        │               0 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ zero_padding2d_2 (ZeroPadding2D)     │ (None, 114, 114, 64)        │               0 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ conv2d_11 (Conv2D)                   │ (None, 112, 112, 128)       │          73,856 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ zero_padding2d_3 (ZeroPadding2D)     │ (None, 114, 114, 128)       │               0 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ conv2d_12 (Conv2D)                   │ (None, 112, 112, 128)       │         147,584 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ max_pooling2d_1 (MaxPooling2D)       │ (None, 56, 56, 128)         │               0 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ zero_padding2d_4 (ZeroPadding2D)     │ (None, 58, 58, 128)         │               0 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ conv2d_13 (Conv2D)                   │ (None, 56, 56, 256)         │         295,168 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ zero_padding2d_5 (ZeroPadding2D)     │ (None, 58, 58, 256)         │               0 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ conv2d_14 (Conv2D)                   │ (None, 56, 56, 256)         │         590,080 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ zero_padding2d_6 (ZeroPadding2D)     │ (None, 58, 58, 256)         │               0 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ conv2d_15 (Conv2D)                   │ (None, 56, 56, 256)         │         590,080 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ max_pooling2d_2 (MaxPooling2D)       │ (None, 28, 28, 256)         │               0 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ zero_padding2d_7 (ZeroPadding2D)     │ (None, 30, 30, 256)         │               0 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ conv2d_16 (Conv2D)                   │ (None, 28, 28, 512)         │       1,180,160 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ zero_padding2d_8 (ZeroPadding2D)     │ (None, 30, 30, 512)         │               0 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ conv2d_17 (Conv2D)                   │ (None, 28, 28, 512)         │       2,359,808 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ zero_padding2d_9 (ZeroPadding2D)     │ (None, 30, 30, 512)         │               0 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ conv2d_18 (Conv2D)                   │ (None, 28, 28, 512)         │       2,359,808 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ max_pooling2d_3 (MaxPooling2D)       │ (None, 14, 14, 512)         │               0 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ zero_padding2d_10 (ZeroPadding2D)    │ (None, 16, 16, 512)         │               0 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ conv2d_19 (Conv2D)                   │ (None, 14, 14, 512)         │       2,359,808 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ zero_padding2d_11 (ZeroPadding2D)    │ (None, 16, 16, 512)         │               0 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ conv2d_20 (Conv2D)                   │ (None, 14, 14, 512)         │       2,359,808 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ zero_padding2d_12 (ZeroPadding2D)    │ (None, 16, 16, 512)         │               0 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ conv2d_21 (Conv2D)                   │ (None, 14, 14, 512)         │       2,359,808 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ max_pooling2d_4 (MaxPooling2D)       │ (None, 7, 7, 512)           │               0 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ conv2d_22 (Conv2D)                   │ (None, 1, 1, 4096)          │     102,764,544 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dropout (Dropout)                    │ (None, 1, 1, 4096)          │               0 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ conv2d_23 (Conv2D)                   │ (None, 1, 1, 4096)          │      16,781,312 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dropout_1 (Dropout)                  │ (None, 1, 1, 4096)          │               0 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ conv2d_24 (Conv2D)                   │ (None, 1, 1, 2622)          │      10,742,334 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ flatten (Flatten)                    │ (None, 2622)                │               0 │
└──────────────────────────────────────┴─────────────────────────────┴─────────────────┘
 Total params: 145,002,878 (553.14 MB)
 Trainable params: 145,002,878 (553.14 MB)
 Non-trainable params: 0 (0.00 B)

Generate Embedding Vectors¶

In [47]:
# Flag to either calculate all embedding vectors or preload vectors from saved file.
# Workaround to save time as this process takes 10 to 15 minutes
# Set to True if embedding needs to be generated
GEN_EMBED = False
In [48]:
def get_embedding_vector(img_path):

    img = cv2.imread(img_path, 1)
    img = img[...,::-1]
    #img = load_image(img_path)
    img = (img / 255.).astype(np.float32)
    img = cv2.resize(img, dsize = (VGG_IMG_WIDTH,VGG_IMG_HEIGHT))
    embedding_vector = vgg_face_desc.predict(np.expand_dims(img, axis=0),verbose=0)[0]
    return embedding_vector

if GEN_EMBED == True:

      embeddings = np.zeros((metadata.shape[0], 2622))
      for i, m in enumerate(metadata):
          img_path = metadata[i].image_path()
          embeddings[i]=get_embedding_vector(img_path)

      np.save('/content/drive/My Drive/GL_CV/data/embeddings.npy',embeddings,allow_pickle=True)

else:

      with open('/content/drive/My Drive/GL_CV/data/embeddings.npy', 'rb') as f:
       embeddings = np.load(f)
In [49]:
embeddings.shape
Out[49]:
(10770, 2622)
In [50]:
embeddings[0].shape
Out[50]:
(2622,)

Helper method to calculate Squared L2 Distance¶

In [51]:
def vec_distance(vec1, vec2):

    '''
    Method to calculate squared L2 distance given 2 vectors


    '''
    return np.sum(np.square(vec1 - vec2))

Display similar images¶

In [52]:
LOW_THRESHOLD = 0.3 #vectors with distance under this will be considered similar


def find_similar_images(vec):

  '''
  Method to find similar images given a vector.
  Returns a list of (index,vectors) that have distances below the lower threshold
  as per distances increasing order

  '''

  dist_list = {}
  for i in range(len(embeddings)):
    vec2 = embeddings[i]
    dist_list[i] = vec_distance(vec1=vec, vec2=vec2)

  dist_list = [(k,v) for k,v in sorted(dist_list.items(),key=lambda item:item[1]) if v <= LOW_THRESHOLD]

  return dist_list


def plot_similar_img(index,):

  '''
  Method to plot similar images given an index from the image data.
  Plots the orginal image and 3 most similar images

  '''

  dist_list = find_similar_images(embeddings[index])

  fig,(ax1,ax2,ax3,ax4) = plt.subplots(1,4,figsize=(15,5))

  #plot original image
  ax1.imshow(cv2.imread(metadata[index].image_path())[...,::-1])
  ax1.set_title(f'Original Image:')

  #plot 3 similar image
  ax2.imshow(cv2.imread(metadata[dist_list[1][0]].image_path())[...,::-1])
  ax2.set_title(f'Distance: {vec_distance(embeddings[index],embeddings[dist_list[1][0]]):.4f}')
  ax3.imshow(cv2.imread(metadata[dist_list[2][0]].image_path())[...,::-1])
  ax3.set_title(f'Distance: {vec_distance(embeddings[index],embeddings[dist_list[2][0]]):.4f}')
  ax4.imshow(cv2.imread(metadata[dist_list[3][0]].image_path())[...,::-1])
  ax4.set_title(f'Distance: {vec_distance(embeddings[index],embeddings[dist_list[3][0]]):.4f}')

  ax1.axis('off')
  ax2.axis('off')
  ax3.axis('off')
  ax4.axis('off')
  plt.show()
In [53]:
#Plot images similar to image with index 100
plot_similar_img(100)
No description has been provided for this image
In [54]:
#Plot images similar to image with index 700
plot_similar_img(700)
No description has been provided for this image

Display dissimilar images¶

In [55]:
UP_THRESHOLD = 0.6 # vectors with distance above this will be considered dissimilar


def find_dissimilar_images(vec):

  '''
  Method to find dissimilar images given a vector.
  Returns a list of (index,vectors) that have distances above the upper threshold
  as per distances increasing order

  '''

  dist_list = {}
  for i in range(len(embeddings)):
      vec2 = embeddings[i]
      dist_list[i] = vec_distance(vec1=vec, vec2=vec2)

  dist_list = [(k,v) for k,v in sorted(dist_list.items(),key=lambda item:item[1]) if v >= UP_THRESHOLD]

  return dist_list


def plot_dissimilar_img(index,):

  '''
  Method to plot dissimilar images given an index from the image data.
  Plots the orginal image and 3 least dissimilar images i.e.
  these images have the least distance from the original image above the upper threshold

  '''

  dist_list = find_dissimilar_images(embeddings[index])

  fig,(ax1,ax2,ax3,ax4) = plt.subplots(1,4,figsize=(15,5))

  #Plot original image
  ax1.imshow(cv2.imread(metadata[index].image_path())[...,::-1])
  ax1.set_title(f'Original Image:')

  #plot 3 dissimilar images
  ax2.imshow(cv2.imread(metadata[dist_list[1][0]].image_path())[...,::-1])
  ax2.set_title(f'Distance: {vec_distance(embeddings[index],embeddings[dist_list[1][0]]):.4f}')
  ax3.imshow(cv2.imread(metadata[dist_list[2][0]].image_path())[...,::-1])
  ax3.set_title(f'Distance: {vec_distance(embeddings[index],embeddings[dist_list[2][0]]):.4f}')
  ax4.imshow(cv2.imread(metadata[dist_list[3][0]].image_path())[...,::-1])
  ax4.set_title(f'Distance: {vec_distance(embeddings[index],embeddings[dist_list[3][0]]):.4f}')


  ax1.axis('off')
  ax2.axis('off')
  ax3.axis('off')
  ax4.axis('off')
  plt.show()
In [56]:
#Plot images dissimilar to image with index 100
plot_dissimilar_img(100)
No description has been provided for this image
In [57]:
#Plot images dissimilar to image with index 300
plot_dissimilar_img(300)
No description has been provided for this image

PART III¶

Approach

  • Apply PCA on embeddings data
  • Train SVM Classifier on PCA data
  • Predict the labels for the two test images using above SVM

Split to train and test¶

In [58]:
X_train, X_test, y_train, y_test = train_test_split(embeddings, labels, test_size=0.2, random_state=42)
In [59]:
#LabelEncoding labels
label_encoder = LabelEncoder()
y_train_enc = label_encoder.fit_transform(y_train)
y_test_enc = label_encoder.transform(y_test)

Find right number of comoponents for PCA¶

In [60]:
from sklearn.decomposition import PCA
pca = PCA()
pca.fit(embeddings)
cumsum = np.cumsum(pca.explained_variance_ratio_)
In [61]:
#Plot the cumulative Explained variance and pick number of components with atleast 90% variance
plt.figure(figsize=(8,4))
plt.plot(cumsum)
plt.xlim(left=0,right=250)
plt.xlabel('Number of Components')
plt.ylabel('Cumulative Explained Variance')
plt.show()
No description has been provided for this image

Choose number of components in PCA as 130

In [62]:
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()

X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

Apply PCA on the embedding vectors¶

In [63]:
pca = PCA(n_components = 130)
X_train_pca = pca.fit_transform(X_train)
X_test_pca = pca.transform(X_test)

SVM Classifier¶

In [64]:
from sklearn.svm import SVC

svm_clf = SVC(C=0.5, gamma=0.001)
svm_clf = SVC()
svm_clf.fit(X_train_pca, y_train_enc)
Out[64]:
SVC()
In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
SVC()

Evaluate SVC performance¶

In [65]:
from sklearn.metrics import accuracy_score

y_preds = svm_clf.predict(X_test_pca)
print(f'Accuracy on test set: {accuracy_score(y_test_enc,y_preds)}')
Accuracy on test set: 0.9591457753017641
In [66]:
y_train_preds = svm_clf.predict(X_train_pca)
print(f'Accuracy on train set: {accuracy_score(y_train_enc,y_train_preds)}')
Accuracy on train set: 0.9960538532961931

Use the trained SVM model to predict the labels of the test images.¶

In [67]:
def predict_image(img_path):

  '''
      Helper method to load image at path and predict its label using the trained SVM model
      Plot the image and predicted label
  '''


  img_vector = get_embedding_vector(img_path)
  img_vector = scaler.transform(img_vector.reshape(1,-1))
  img_vector = pca.transform(img_vector)

  label = svm_clf.predict(img_vector)

  plt.figure(figsize=(3,3))
  plt.imshow(cv2.imread(img_path)[...,::-1])
  pred_label = label_encoder.inverse_transform(label)
  plt.title(f'Image identified as {pred_label}')
  plt.axis('off')
In [68]:
predict_image("/content/drive/My Drive/GL_CV/data/Dwayne Johnson4.jpg")
No description has been provided for this image
In [69]:
predict_image("/content/drive/My Drive/GL_CV/data/Benedict Cumberbatch9.jpg")
No description has been provided for this image

Insights¶

A. Face Detection Model using ResNet

  • Model performance is satisfactory but can be improved by fine tuning on a larger dataset
  • Data augmentation may not be useful as still shots from movie are likely to have a regular orientation
  • More images may be needed with specific characters with pull-on masks/animated faces but not masked in truth masks, so model can stop recognising these as faces

B. Image Dataset using Cascade Classifier

  • Image dataset used only images with single faces, more varied images can be added
  • Current model works only on frontal face profiles. Images can also include faces with other profiles/angles - in which case a generalised model will need to be explored

C. Face Recognition System using SVM

  • Applying PCA on embedding vectors allows to minimise computation needs for the SVM classifier
  • SVM Classifier has a good accuracy of 95% on test data

Business Recommendations¶

  • Use a highly accurate face detection and recognition system on the movie streaming application. This will allow users to identify the users in a particular scene and lead to better user experience

  • Reach out to movie studios and production houses to gather good quality images periodically(monthly/quarterly). This will ensure availability of latest data for recent releases as well

  • Monitor model performance on a regular basis and update the model with the latest computer vision technology to ensure the best face detection and recognition solution is available for the viewers

  • Build a recommendation system to suggest other movies with the actors in the current scene have worked on. Additional information like actor trivia can also be included to give a competitive edge across other streaming platforms